LinkRank: Finding communities in directed networks 
To identify communities in directed networks, we propose a generalized form of modularity in directed networks by introducing a new quantity, LinkRank, which can be considered the PageRank of links. This generalization is consistent with the original modularity in undirected networks, and the modularity optimization methods developed for undirected networks can be applied directly to directed networks by optimizing our new modularity. Also, a model network, which can be used as a benchmark network in further community studies, is proposed to verify our method. Our method is expected to find communities effectively in citation- or reference-based directed networks.

PACS numbers: 89.75.Hc, 02.10.Ox, 02.50.-r 




I. INTRODUCTION 



Uncovering the structure of nature is an essential part of our effort to understand the world around us, and the same holds for complex networks, which are considered a simple but powerful representation of real-world complex systems. Among the underlying structures of complex networks, community structure is considered important because it has been shown to be strongly related to the dynamics and functions of complex networks [1, 2]. Hence, considerable attention has been given from various fields to uncovering the community structure of networks [T, S].

Generally, a community is a group of nodes that are densely interconnected compared to the rest of the network. A network is considered to have community structure when more links are placed within the communities and fewer links are placed between them. Uncovering the community structure of a given network means finding the community assignment that best describes the underlying community structure. To decide whether one community assignment is better than the other possible assignments, a benefit function is required. Modularity, proposed by Newman and Girvan [10], is one of the most widely used benefit functions. Although modularity has been reported to suffer from a resolution limit and a bias towards balanced partitions [11, 12], it is still considered an efficient measure for uncovering community structure.

Even after modularity is chosen as the benefit function, a difficult problem remains. Finding the community assignment with the highest modularity is not an easy task, because exhaustive optimization of modularity is usually computationally infeasible. To overcome this difficulty, many methods [H, have been proposed to approximate the highest modularity in a reasonable time, and most of them work effectively relative to the computing power they require. It is important to note that those methods can only be applied to undirected networks, whose links have no specific direction, because the definition of modularity is limited to undirected networks. However, many real-world complex networks are directed, such as the World Wide Web, citation networks, and phone call or email networks.

In many directed networks, the direction of a link carries important information, such as asymmetric influence or information flow. A link between a pair of nodes may represent fundamentally different dynamics when its direction is reversed. Any approach that disregards the direction of links may fail to capture the dynamics and the function of directed networks. Likewise, any community-finding approach may fail to detect communities correctly if the direction of links is not considered properly. This raises the fundamental question of community identification in directed networks: how should the direction of links be taken into account? This question is not only essential to community identification but also important to the fundamental understanding of directed networks.

Several recent studies [H, [H, [13, [3 have tried to answer this question. However, it is important to note that these methods do not share a common definition of community structure in directed networks. The method of Newman and Leicht and the method of Guimera et al. [16] use a similar definition, under which nodes are assigned to the same community when they are linked to similar neighbors. This definition differs from the general definition of community. A fundamentally different approach is adopted in the work of Rosvall and Bergstrom [17], who used an information-theoretic method that does not appear to be related to modularity optimization. Leicht and Newman [18] proposed yet another method, adopting a generalized modularity [19] to identify community structure in directed networks. Since the generalized modularity is consistent with the original modularity in undirected networks, the advantage of this method is apparent: the modularity optimization algorithms developed for undirected networks should be applicable to directed networks. However, we find that this method may have some limitations. In Sec. II, it is shown that the generalized modularity may not work as described in Ref. [18], and an alternative interpretation of the generalized modularity is discussed.

In this paper, we propose a new generalization of modularity based on LinkRank, a quantity indicating the importance of links in directed networks. The definition of community changes accordingly with this new modularity. It will be shown that this definition is consistent with the old definition of community and that it takes the direction of links into account properly. The application to a model network in Sec. V shows that our method detects communities effectively. We treat weighted networks in the derivations, since binary networks can be considered a special kind of weighted network in which the weight of every link is one.




FIG. 1: The generalized modularity does not distinguish the direction of links. Nodes A, B, A', and B' are four nodes in a directed binary network. The out-strengths and in-strengths of these nodes are $w_A^{out} = w_{A'}^{out} = w_B^{in} = w_{B'}^{in} = 3$ and $w_A^{in} = w_{A'}^{in} = w_B^{out} = w_{B'}^{out} = 1$. The contribution of the link between nodes A and B is equal to the contribution of the link between nodes A' and B': $q_{AB} = q_{A'B'} = \frac{1}{2M} - \frac{5}{M^2}$.



II. GENERALIZED MODULARITY 

In undirected networks, a well-established way to find communities is modularity optimization, which searches for the community assignment that maximizes a benefit function called modularity, $Q^{ud}$ [1, 2]. The modularity is defined as

$$Q^{ud} = \frac{1}{2M}\sum_{ij}\left[W_{ij} - \frac{w_i w_j}{2M}\right]\delta_{c_i,c_j}, \tag{1}$$

where $W_{ij}$ is an element of the weighted adjacency matrix, representing the weight of the link between nodes i and j; $w_i = \sum_j W_{ij}$ is the strength of node i; the total strength is $2M = \sum_i w_i = \sum_i \sum_j W_{ij}$; and $\delta_{c_i,c_j}$ equals 1 if nodes i and j belong to the same community ($c_i = c_j$) and 0 otherwise.
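As an illustration only, Eq. (1) can be evaluated directly with NumPy. The sketch assumes a dense symmetric weight matrix and one integer community label per node; `modularity_undirected` is a hypothetical helper name, not something defined in the paper.

```python
import numpy as np


def modularity_undirected(W, labels):
    """Q^ud of Eq. (1) for a symmetric weight matrix W and per-node community labels."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    w = W.sum(axis=1)                              # strength w_i = sum_j W_ij
    two_m = w.sum()                                # total strength 2M
    same = labels[:, None] == labels[None, :]      # Kronecker delta(c_i, c_j)
    return float(((W - np.outer(w, w) / two_m) * same).sum() / two_m)


# Two disconnected triangles, each placed in its own community.
A = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[a, b] = A[b, a] = 1.0
print(modularity_undirected(A, [0, 0, 0, 1, 1, 1]))  # 0.5
```

Putting all six nodes into one community gives $Q^{ud} = 0$, since the two sums in Eq. (1) then both equal 2M.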

The modularity can be understood as the difference of two quantities. The first is the fraction of links within communities, and the second is the expected value of that fraction in a network with the same community divisions and the same strength sequence but randomly connected links. Modularity $Q^{ud}$ approaches 1 when a strong community structure is found and approaches 0 when the fraction of links within the communities is no better than in the random case. However, this does not mean that the maximized modularity of every random network is close to zero: some random networks can have a very high maximized modularity owing to fluctuations in how the links are placed [21].

Arenas et al. [19] proposed a generalization of modularity to directed networks by simply replacing the strength terms with directional ones. The generalized modularity can be written as

$$Q^{d} = \frac{1}{M}\sum_{ij}\left[W_{ij} - \frac{w_i^{out} w_j^{in}}{M}\right]\delta_{c_i,c_j}, \tag{2}$$

where $W_{ij}$ represents the weight of the link pointing from node i to node j, $w_i^{out} = \sum_j W_{ij}$ and $w_j^{in} = \sum_i W_{ij}$ are respectively the out-strength of node i and the in-strength of node j, and the total strength is $M = \sum_i w_i^{out} = \sum_j w_j^{in}$.

Leicht and Newman [18] used this definition of modularity to find communities in directed networks, both in computer-generated and in real-world networks. They described the meaning of the generalized modularity as follows. For a pair of nodes labeled A and B, when node A has a high out-degree and a low in-degree while B has the reverse, a directed link connecting A and B is more likely to point from A to B than in the opposite direction. Hence, finding a directed link running from B to A is a bigger surprise than finding a link from A to B. The link from B to A should therefore contribute more to the modularity, since modularity should be high for statistically surprising configurations.

However, the generalized modularity may not work as described above. Because $\delta_{c_i,c_j}$ is equal to $\delta_{c_j,c_i}$, the generalized modularity $Q^d$ can be rewritten as

$$Q^{d} = \frac{1}{2M}\sum_{ij}\left[(W_{ij} + W_{ji}) - \frac{w_i^{out} w_j^{in} + w_j^{out} w_i^{in}}{M}\right]\delta_{c_i,c_j}. \tag{3}$$

Since $W_{ij}$ and $W_{ji}$ are summed together and controlled by the same $\delta_{c_i,c_j}$, it is questionable whether the generalized modularity can distinguish the direction of links.
Fig. 1 shows part of a directed binary network. Nodes A and A' have a higher out-degree, while nodes B and B' have a higher in-degree. According to Leicht and Newman's explanation, nodes A and B should be more likely to be assigned to the same community than nodes A' and B'. However, the contributions of both pairs to the generalized modularity are actually equal:

$$q_{AB} = q_{A'B'}, \tag{4}$$

where

$$q_{ij} = \frac{1}{2M}\left[(W_{ij} + W_{ji}) - \frac{w_i^{out} w_j^{in} + w_j^{out} w_i^{in}}{M}\right] \tag{5}$$

is the contribution of the link between nodes i and j to the generalized modularity $Q^d$. Therefore, it is doubtful that the generalized modularity works as described above. This raises two questions: how can the generalized modularity identify communities in directed networks, and what is the meaning of the generalized modularity?
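The equality in Fig. 1 can be checked with the pair contribution $q_{ij} = [(W_{ij} + W_{ji}) - (w_i^{out} w_j^{in} + w_j^{out} w_i^{in})/M]/(2M)$: reversing a single link leaves $W_{ij} + W_{ji}$ unchanged. The sketch below assumes a hypothetical total weight M = 20 (the figure shows only part of a network) and assumes that the link between A and B runs B to A while the link between A' and B' runs A' to B'; the opposite choice gives the same numbers. `pair_contribution` is an illustrative name.

```python
def pair_contribution(w_ij, w_ji, out_i, in_i, out_j, in_j, M):
    """Contribution q_ij of the pair (i, j) to the generalized modularity Q^d."""
    return ((w_ij + w_ji) - (out_i * in_j + out_j * in_i) / M) / (2 * M)


M = 20.0  # hypothetical total link weight of the whole network
# A and A' have (out, in) = (3, 1); B and B' have (out, in) = (1, 3).
q_AB = pair_contribution(0, 1, 3, 1, 1, 3, M)     # assumed link B -> A ("surprising")
q_ApBp = pair_contribution(1, 0, 3, 1, 1, 3, M)   # assumed link A' -> B' ("expected")
print(q_AB, q_ApBp, 1 / (2 * M) - 5 / M**2)       # all three agree
```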

As explained in the appendix of Ref. [l^, the relation between the generalized modularity $Q^d$ in directed networks and the modularity $Q^{ud}$ in undirected networks can be expressed as

$$Q^{d} = Q^{ud} + \frac{1}{4M^2}\sum_{ij}\Delta_i \Delta_j\,\delta_{c_i,c_j}, \tag{6}$$

where $Q^{ud}$ is the modularity of the undirected network generated from the original directed network by ignoring link directions (so that node i has strength $w_i = w_i^{out} + w_i^{in}$ and the total strength is 2M), and $\Delta_i = w_i^{out} - w_i^{in}$ is the net-strength of node i.
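The relation $Q^d = Q^{ud} + \frac{1}{4M^2}\sum_{ij}\Delta_i\Delta_j\delta_{c_i,c_j}$ can be checked numerically. The sketch below, assuming dense NumPy matrices, integer labels, and hypothetical helper names, evaluates $Q^d$ directly from its definition and compares it with the right-hand side on a random weighted directed network with a random partition:

```python
import numpy as np


def modularity_directed(W, labels):
    """Generalized modularity Q^d; W[i, j] is the weight of the link i -> j."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    M = W.sum()
    w_out, w_in = W.sum(axis=1), W.sum(axis=0)
    same = labels[:, None] == labels[None, :]
    return float(((W - np.outer(w_out, w_in) / M) * same).sum() / M)


def modularity_undirected(A, labels):
    """Q^ud for a symmetric weight matrix A."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    w = A.sum(axis=1)
    two_m = w.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(w, w) / two_m) * same).sum() / two_m)


rng = np.random.default_rng(0)
W = rng.random((10, 10)) * (rng.random((10, 10)) < 0.3)   # random sparse directed weights
labels = rng.integers(0, 3, size=10)                       # random community labels

M = W.sum()
net_strength = W.sum(axis=1) - W.sum(axis=0)               # Delta_i = w_i^out - w_i^in
same = labels[:, None] == labels[None, :]
rhs = (modularity_undirected(W + W.T, labels)              # directions ignored
       + (np.outer(net_strength, net_strength) * same).sum() / (4 * M**2))
print(np.isclose(modularity_directed(W, labels), rhs))    # True
```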

Hence, the second term in Eq. (6) represents the additional information taken into account by the generalized modularity $Q^d$. This term gives a positive contribution to $Q^d$ only when $\Delta_i$ and $\Delta_j$ are both positive or both negative. Therefore, its effect is to group together nodes with positive net-strength, and separately nodes with negative net-strength. In the extreme case where there is no community structure when link directions are ignored, the first term $Q^{ud}$ in Eq. (6) makes no contribution to the modularity, and the second term dominates. Clearly, maximizing the generalized modularity $Q^d$ then divides the directed network into two communities: one containing the nodes with positive net-strength and the other containing the nodes with negative net-strength. The example networks presented in Ref. [18] are similar to this extreme case, in which no communities can be found if the directions of links are ignored. Those networks were divided into two communities: one composed of nodes with positive net-strength and the other composed of nodes with negative net-strength. The effect described by Leicht and Newman, namely that nodes connected by a link of surprising direction are more likely to be in the same community, is not captured by this approach.



III. LINKRANK AND A NEW 
GENERALIZATION OF MODULARITY 

The most important property of a directed network is the direction of its links. For example, in a directed network of webpages, a webpage with more incoming hyperlinks is much more important, and more likely to be visited, than a webpage with more outgoing hyperlinks, even if the two pages have the same degree (the sum of in-degree and out-degree). Furthermore, a webpage linked by an important webpage should be more important than a webpage linked by a minor one. Therefore, a link from an important page should be more important than a link from a minor page; that is, a link from an important page should be more likely to be an intra-community link. To identify the communities in a directed network, it is necessary to take this unique property of directed networks into account. In fact, a quantity called PageRank already exploits this property.

PageRank [12, is an analysis algorithm used by Google to rank webpages in the World Wide Web, a typical directed network. PageRank assigns to each webpage a quantity indicating its importance, based on the premise that a webpage is important if it is pointed to by other important pages. Mathematically, the PageRank of a page is the probability that the page is visited by a random surfer who follows hyperlinks at random. The PageRank equation can be written as

$$\pi^{T} = \pi^{T} G, \tag{7}$$

where $\pi^T$ is the stationary row vector of G, called the PageRank vector, and each element $\pi_i$ is the probability that a random walker visits node i in the stationary state. G is called the Google matrix; it is the transition probability matrix of the random walk. Each element $G_{ij}$ is the probability that a random walker on node i moves to node j in the next step. In its simplest form, $G_{ij} = W_{ij}/w_i^{out}$, where $W_{ij}$ is an element of the weighted adjacency matrix of the directed network and $w_i^{out}$ is the out-strength of node i.

A directed network may contain dangling nodes, which have no outgoing links, and "trap regions," which a random walker can enter but not leave. In this case, the Google matrix defined above cannot guarantee the existence of the stationary row vector $\pi^T$, because G may not satisfy the requirements of a stochastic matrix in a Markov process [l^]. To avoid this problem, the Google matrix is actually defined as

$$G_{ij} = \alpha\frac{W_{ij}}{w_i^{out}} + \frac{1}{N}\left[\alpha a_i + (1-\alpha)\right], \tag{8}$$

where $(1-\alpha)$ is the teleportation probability, with which the random walker stops following hyperlinks and opens a random webpage; N is the number of nodes; and $a_i$ is equal to one if node i is a dangling node and zero otherwise. The value of $W_{ij}/w_i^{out}$ is set to zero when $w_i^{out} = 0$. By adding $a_i$ and $\alpha$ to the definition of G, the random walker is never trapped in any part of the network. Mathematically speaking, the purpose of this modification is to make the Google matrix G a completely dense, stochastic, and primitive matrix. Therefore, a stationary vector $\pi^T$ always exists for the Google matrix G [Hill].
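A minimal sketch of this construction, assuming a dense NumPy weight matrix `W` with `W[i, j]` the weight of the link i to j, and plain power iteration as the solver (the paper does not say which solver it used). `google_matrix` and `pagerank` are hypothetical helper names, not a library API.

```python
import numpy as np


def google_matrix(W, alpha=0.85):
    """Google matrix G_ij = alpha W_ij / w_i^out + (alpha a_i + 1 - alpha) / N."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    w_out = W.sum(axis=1)                       # out-strength of each node
    dangling = (w_out == 0).astype(float)       # a_i = 1 for dangling nodes
    safe = np.where(w_out > 0, w_out, 1.0)      # avoid 0/0; dangling rows of W are zero
    P = W / safe[:, None]                       # W_ij / w_i^out, zero rows for dangling nodes
    return alpha * P + ((alpha * dangling + 1.0 - alpha) / n)[:, None]


def pagerank(G, tol=1e-12, max_iter=10_000):
    """Stationary row vector pi with pi = pi G, found by power iteration."""
    n = G.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = pi @ G
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi


# Node 2 has no outgoing links, i.e. it is a dangling node.
W = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
G = google_matrix(W)
pi = pagerank(G)
print(G.sum(axis=1), pi)
```

Because every row of G sums to one and α < 1, the L1 distance between successive iterates shrinks by at least a factor α per step, so the loop is guaranteed to converge.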

Following the idea of PageRank, we propose the concept of LinkRank, which indicates the importance of links rather than the importance of nodes as in PageRank. Analogous to PageRank, the LinkRank of the link from node i to node j is defined as the probability that a random walker follows that link in the stationary state. With $\pi_i$ and $G_{ij}$ defined as above, LinkRank can be written simply as

$$L_{ij} = \pi_i G_{ij}, \tag{9}$$

where $\pi_i$ is the ith element of the PageRank vector $\pi$ and $G_{ij}$ is an element of the Google matrix G.
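LinkRank is then a single elementwise product, $L_{ij} = \pi_i G_{ij}$. In the sketch below (hypothetical helper names), the stationary vector is obtained by solving $\pi(G - I) = 0$ together with $\sum_i \pi_i = 1$ as a least-squares system; this is an alternative to power iteration and assumes the stationary vector is unique, which the teleportation term guarantees. The printed checks reflect two properties of LinkRank: the LinkRank flowing out of node i and into node i both equal $\pi_i$, and all LinkRanks sum to one.

```python
import numpy as np


def google_matrix(W, alpha=0.85):
    """Google matrix with teleportation; W[i, j] is the weight of the link i -> j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    w_out = W.sum(axis=1)
    dangling = (w_out == 0).astype(float)
    P = W / np.where(w_out > 0, w_out, 1.0)[:, None]
    return alpha * P + ((alpha * dangling + 1.0 - alpha) / n)[:, None]


def stationary(G):
    """Solve pi (G - I) = 0 with sum(pi) = 1 (assumes a unique stationary vector)."""
    n = G.shape[0]
    A = np.vstack([G.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def linkrank(W, alpha=0.85):
    """LinkRank matrix L_ij = pi_i G_ij, returned together with pi."""
    G = google_matrix(W, alpha)
    pi = stationary(G)
    return pi[:, None] * G, pi


W = np.array([[0, 2, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])   # node 3 is dangling
L, pi = linkrank(W)
print(np.allclose(L.sum(axis=1), pi), np.allclose(L.sum(axis=0), pi), np.isclose(L.sum(), 1.0))  # True True True
```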

As described in Sec. II, the modularity in undirected networks is qualitatively defined as

$$Q^{ud} = (\text{fraction of links within communities}) - (\text{expected value of this fraction}), \tag{10}$$

where the expected value is calculated in a network with the same community divisions and the same strength sequence but randomly connected links. In this paper, we propose a new definition of modularity for both directed and undirected networks:

$$Q^{lr} = (\text{fraction of time spent walking within communities by a random walker}) - (\text{expected value of this fraction}). \tag{11}$$

Since modularity intrinsically defines what a community is, it is important to note that the definition of community changes in our method. According to the new modularity $Q^{lr}$, a community is no longer a group of nodes among which links are more densely placed. Instead, a community is a group of nodes in which a random walker is more likely to stay. Although this definition may seem arbitrary, it will be shown below that it is consistent with the old one in undirected networks and that it treats links of different directions properly.

Using LinkRank, this new definition can be written in mathematical form as

$$Q^{lr} = \sum_{ij}\left[L_{ij} - E(L_{ij})\right]\delta_{c_i,c_j}, \tag{12}$$

where $E(L_{ij})$ is the expected value of $L_{ij}$ in the null model. In Eq. (12), the first term is the fraction of time a random walker spends walking within communities, since $L_{ij}$ is the probability that the random walker follows the link from i to j, and the second term is the expected value of this fraction. The two terms correspond to the first and second terms of Eq. (11), respectively.



To calculate the expected value of $L_{ij}$, a null model must first be chosen. In the definition of modularity in undirected networks, the standard null model is a network with the same strength (or degree, in binary networks) sequence as the original network; that is, the expected strength of each node is conserved and the links are randomly rewired. In directed networks, however, it is not appropriate to choose the same null model as in undirected networks, since strength is not directly related to the random walk process. Instead, PageRank is the intrinsic property of nodes in the random walk process. It can be shown that the null model conserving the strength sequence does not detect communities as expected, while the null model conserving the PageRank sequence does. Therefore, we choose as the null model a random network in which the PageRank sequence is conserved and the links are randomly rewired.

In this null model, the expected value of $L_{ij}$ can be calculated as follows. As defined above, LinkRank $L_{ij}$ is the probability that a random walker moves from node i to node j in the stationary state, and PageRank $\pi_i$ is the probability that a random walker visits node i in the stationary state. To move from node i to node j, the random walker has to visit node i in the previous step and node j in the next step. The probability of visiting node i is $\pi_i$, and the probability of visiting node j in the next step is $\pi_j$, because the connection between nodes i and j in the original network is not conserved in the null model. Therefore, the probability that a random walker moves from node i to node j in the null model is $\pi_i \pi_j$; that is, the expected value of $L_{ij}$ in the null model is

$$E(L_{ij}) = \pi_i \pi_j. \tag{13}$$
Finally, the modularity in directed networks is

$$Q^{lr} = \sum_{ij}\left[\pi_i G_{ij} - \pi_i \pi_j\right]\delta_{c_i,c_j}. \tag{14}$$
Interestingly, our new definition of modularity is consistent with the old definition of modularity in Eq. (1). It is well known that, when random teleportation is not considered, the PageRank vector $\pi^{ud}$ in undirected networks satisfies $\pi_i^{ud} = w_i/2M$, where $w_i$ is the strength of node i and $2M = \sum_j w_j$ is the total strength of the undirected network [24, 26, 21]. This means that the probability that a random walker visits node i in the stationary state depends only on the local structure of node i, rather than on the global structure. Because every link in an undirected network is a bidirectional path, undirected networks have no dangling nodes or trap regions. Therefore, the second term of Eq. (8) can also be dropped, and the Google matrix $G^{ud}$ of the undirected network is $G_{ij}^{ud} = W_{ij}/w_i$. The LinkRank of the undirected network is then
$$L_{ij}^{ud} = \pi_i^{ud} G_{ij}^{ud} = \frac{w_i}{2M}\frac{W_{ij}}{w_i} = \frac{W_{ij}}{2M}. \tag{15}$$

The expected value of LinkRank in the undirected network is

$$E(L_{ij}^{ud}) = \pi_i^{ud}\pi_j^{ud} = \frac{w_i}{2M}\frac{w_j}{2M}. \tag{16}$$

Then, the undirected version of our new modularity is

$$Q^{lr} = \sum_{ij}\left[\frac{W_{ij}}{2M} - \frac{w_i}{2M}\frac{w_j}{2M}\right]\delta_{c_i,c_j}, \tag{17}$$

which is identical to the definition of modularity in Eq. (1). This means that our new definition of community (a group of nodes in which a random walker is more likely to be trapped) is consistent with the old definition of community (a group of nodes among which links are more densely located).
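The reduction to Eq. (1) can be checked numerically. The sketch below (hypothetical helper names; dense NumPy matrices assumed) computes $Q^{lr} = \sum_{ij}[\pi_i G_{ij} - \pi_i\pi_j]\delta_{c_i,c_j}$ with the teleporting Google matrix, then switches teleportation off (α = 1) on a random undirected weighted network and compares the result with $Q^{ud}$. The stationary vector is found by a linear solve, which also works when the walk without teleportation is periodic.

```python
import numpy as np


def linkrank_modularity(W, labels, alpha=0.85):
    """Q^lr = sum_ij (pi_i G_ij - pi_i pi_j) delta(c_i, c_j); W[i, j] is the weight of i -> j."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    n = W.shape[0]
    w_out = W.sum(axis=1)
    dangling = (w_out == 0).astype(float)
    P = W / np.where(w_out > 0, w_out, 1.0)[:, None]
    G = alpha * P + ((alpha * dangling + 1.0 - alpha) / n)[:, None]   # Google matrix
    # Stationary (PageRank) vector: pi (G - I) = 0 with sum(pi) = 1.
    A = np.vstack([G.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    same = labels[:, None] == labels[None, :]
    return float(((pi[:, None] * G - np.outer(pi, pi)) * same).sum())


def modularity_undirected(W, labels):
    """Q^ud of Eq. (1) for a symmetric weight matrix."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    w = W.sum(axis=1)
    two_m = w.sum()
    same = labels[:, None] == labels[None, :]
    return float(((W - np.outer(w, w) / two_m) * same).sum() / two_m)


rng = np.random.default_rng(1)
U = rng.random((8, 8))
U = U + U.T                                  # symmetric weights: an undirected network
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])
# Without teleportation (alpha = 1), Q^lr should coincide with Q^ud.
print(np.isclose(linkrank_modularity(U, labels, alpha=1.0),
                 modularity_undirected(U, labels)))  # True
```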

Also, the new modularity in Eq. (14) has a form similar to the one-step stability of Delvenne et al. [ll|, another work revealing the connection between random walks and modularity.

A remarkable advantage of our method is that all the established optimization techniques [H, [ij developed to maximize the old modularity in undirected networks can be applied to our method directly, except for a few algorithms that require some adjustments. For example, to apply the eigenvector-based method [31], a small trick introduced in Ref. [18] is needed to restore the symmetry of the modularity matrix.
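One way to see the symmetry issue: the LinkRank modularity matrix $B_{ij} = L_{ij} - \pi_i\pi_j$ is not symmetric, but $Q^{lr} = \sum_{ij} B_{ij}\delta_{c_i,c_j}$ is unchanged if B is replaced by $(B + B^T)/2$, because $\delta_{c_i,c_j}$ is symmetric. Our reading (an assumption, not a quote of Ref. [18]) is that this symmetrization is the trick in question. Since $\sum_{ij} B_{ij} = 0$, a two-way split encoded by $s_i = \pm 1$ has $Q^{lr} = \frac{1}{2}s^T B s$, so the sign pattern of the leading eigenvector gives a candidate bisection. The test network and all helper names are hypothetical.

```python
import numpy as np


def linkrank_modularity_matrix(W, alpha=0.85, tol=1e-12):
    """Symmetrized LinkRank modularity matrix (B + B^T) / 2 with B_ij = pi_i G_ij - pi_i pi_j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    w_out = W.sum(axis=1)
    dangling = (w_out == 0).astype(float)
    P = W / np.where(w_out > 0, w_out, 1.0)[:, None]
    G = alpha * P + ((alpha * dangling + 1.0 - alpha) / n)[:, None]
    pi = np.full(n, 1.0 / n)
    for _ in range(10_000):                     # power iteration for the PageRank vector
        new = pi @ G
        converged = np.abs(new - pi).sum() < tol
        pi = new
        if converged:
            break
    B = pi[:, None] * G - np.outer(pi, pi)      # L_ij - pi_i pi_j
    return (B + B.T) / 2


def bisect(W, alpha=0.85):
    """Split nodes by the sign of the leading eigenvector; return signs and Q^lr of the split."""
    B = linkrank_modularity_matrix(W, alpha)
    vals, vecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    s = np.where(vecs[:, -1] >= 0, 1.0, -1.0)
    return s, 0.5 * s @ B @ s


# Two directed 4-cycles joined by weak links 3 -> 4 and 7 -> 0.
W = np.zeros((8, 8))
for k in range(4):
    W[k, (k + 1) % 4] = 1.0
    W[4 + k, 4 + (k + 1) % 4] = 1.0
W[3, 4] = W[7, 0] = 0.1
s, q = bisect(W)
print(s, q)
```

Repeated bisection with a check that each split increases $Q^{lr}$, as in the undirected eigenvector method, would be needed for more than two communities; that refinement is omitted here.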

It is important to note that our method cannot be applied to every kind of network, because the direction of a link does not have a universal meaning across directed networks. For example, although a word adjacency network [l^ and the World Wide Web are both directed networks, the meaning of direction is fundamentally different in the two. The direction of a link in a word adjacency network describes the relative position of the linked words in a sentence, while the direction of a link in the World Wide Web indicates a citation or reference, and information can spread by following the directed links. Since PageRank can be applied to any collection of entities linked by citations and references, our method is expected to detect communities in directed networks based on citation and reference. This does not mean that our method is limited to linked documents. Since social networks such as directed friendship networks, phone call networks, and email networks can be considered a general form of citation/reference networks, our method could be used to detect communities in those networks too.

The parameter $\alpha$ controls the priority given to the network structure as opposed to the teleportation effect. When $\alpha$ is close to 1, the random walk process depends more on the network structure. Therefore, PageRank, LinkRank, and the new modularity are more likely to capture the characteristics of the network when $\alpha$ is closer to 1. However, it has been reported that PageRank becomes more sensitive to slight changes in the network structure as $\alpha$ approaches 1 [23]. Considering that no network can describe the underlying system fully and correctly, PageRank cannot capture the characteristics of the network when it is too sensitive to slight changes in the structure. A balance must be struck between respecting the network structure and reducing sensitivity. The choice of $\alpha$ is an important issue for any random walk research on directed networks, so studying the effect of $\alpha$ requires a general discussion of random walk problems in directed networks. We would like to tackle this problem in future studies, but it is beyond the scope of this paper. For now, $\alpha = 0.85$ seems a reasonable choice, since it is widely used by other researchers [S^, [H, H^].

[Fig. 2 panel annotations: Rosvall and Bergstrom, L = 4.13 bits/step; Leicht and Newman, Q = 0.50; LinkRank, Q = 0.33]

FIG. 2: (color online) Maximizing our new modularity finds the same community assignment as the method proposed by Rosvall and Bergstrom. The weight of the bold links is twice the weight of normal links, and the color of a node indicates the community to which the node belongs. (a) Community assignment given by optimizing the new modularity of our method or by optimizing the map equation of Rosvall and Bergstrom [17]. The new modularity for this community assignment is $Q^{lr} = 0.42$, while the modularity used by Leicht and Newman [18] is $Q^d = 0.25$. (b) Community assignment given by optimizing the modularity used by Leicht and Newman. Our new modularity for this assignment is $Q^{lr} = 0.33$, while the modularity used by Leicht and Newman is $Q^d = 0.50$.



IV. RELATION WITH OTHER COMMUNITY 
IDENTIFICATION WORKS IN DIRECTED 
NETWORKS 

As described in Sec. II, Leicht and Newman [18] proposed that links with opposite directions should be treated differently to identify communities correctly in directed networks. The way direction information enters our new modularity is very similar to the way it enters Leicht and Newman's work. Consider the pair of nodes A and B again. When node A has a lower PageRank and node B has a higher PageRank, the LinkRank of the link pointing from A to B is likely to be lower than the LinkRank of the link from B to A. Therefore, nodes A and B are more likely to be in the same community if the link points from node B to node A than if it points in the opposite direction. Thus, the asymmetric effect of link direction that Leicht and Newman wanted to include in their method is captured by our new definition of modularity in a systematic way, through the theory of random walks.

The work of Rosvall and Bergstrom [17] is also directly related to ours. They proposed an information-theoretic method to detect communities in directed networks, which can be briefly described as follows. For a given community assignment of a directed network, a name is assigned to each node and to each community. Nodes in the same community must have different names so that they can be told apart, while nodes in different communities may share names, because they can be distinguished by their community names. Given these node and community names, a description can be assigned to any trajectory of a random walk on the network. The description records the name of each visited node, and it records the name of the community containing the current node, before the node name, only when the random walker has just arrived from a node in another community. Thus, this description is unique to each random-walk trajectory. When a random walker is more likely than average to stay within a group of nodes, assigning this group to the same community makes the description shorter. Therefore, the community structure can be identified by minimizing the length of the trajectory description.

Although our method may seem unrelated to this method, both share the same definition of community structure: a community is a group of nodes in which a random walker is more likely to be trapped than to leave within a few steps. The simple directed network of sixteen nodes in Fig. 2 was originally proposed in the work of Rosvall and Bergstrom. The weight of the bold links is twice the weight of the other links. As shown in the figure, the communities detected by our method are identical to the results of Rosvall and Bergstrom, favoring the configuration with a long persistence time. The new modularity for the community assignment in Fig. 2(a) is larger than that in Fig. 2(b) (0.42 versus 0.33), while the modularity calculated by the method of Leicht and Newman is higher for the community assignment in Fig. 2(b) (0.50 versus 0.25).



V. APPLICATION TO A MODEL NETWORK 

The network illustrated in Fig. 3 is a directed model network designed to verify our method. In this network, n directed small rings, each composed of m nodes, are embedded on a big ring. The weight of the links between small rings is a tunable parameter w, while the weight of all other links is fixed at 1. The small rings are the embedded communities of this model network. When w is small, it is difficult for the random walker to escape from each small ring, and the communities should perfectly overlap with the small rings, recalling that we regard a community in a directed network as a group of nodes in which a random walker is more likely to be trapped. As w grows, it becomes easier for the random walker to move out of each small ring, and consequently the embedded community structure becomes more difficult to identify. When w is large enough, it is no longer reasonable to identify the small rings as communities.

[Fig. 3 labels: n small rings in total; m nodes in each small ring]

FIG. 3: A model network designed to verify our method. This directed network is composed of n sub-networks embedded on a ring structure, and each sub-network is a small ring composed of m nodes. Each small ring has an entrance node, which receives a directed link from the upstream ring, and an exit node, which sends a directed link to the downstream ring; the entrance node and the exit node are placed on opposite sides of the small ring. The direction is counterclockwise in both the small rings and the big ring. The weight of the links between sub-networks is a tunable parameter w, while the weight of the links within every sub-network is fixed at 1. According to our definition of community in directed networks, each small ring should be considered a community as long as w is not too large, because a random walker is more likely to be trapped in each small ring than to move freely between the small rings.
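The model network can be sketched as follows. The node numbering, and placing each exit node m/2 steps after the entrance node (our reading of "opposite side" for even m), are assumptions for illustration; `ring_of_rings` and `linkrank_modularity` are hypothetical helper names. Teleportation is switched off (α = 1), which is possible because no node is dangling. For m = n = 8 the resulting walk is periodic (every cycle length is a multiple of 8), so plain power iteration would oscillate; the stationary vector is obtained from a linear solve instead.

```python
import numpy as np


def ring_of_rings(n, m, w):
    """Weighted adjacency matrix of the model network (hypothetical node numbering).

    Ring k occupies nodes k*m .. k*m + m - 1. Local node 0 is the entrance and
    local node m // 2 is assumed to be the exit on the opposite side.
    """
    W = np.zeros((n * m, n * m))
    for k in range(n):
        base = k * m
        for i in range(m):                       # directed small ring, weight 1
            W[base + i, base + (i + 1) % m] = 1.0
        exit_node = base + m // 2
        next_entrance = ((k + 1) % n) * m
        W[exit_node, next_entrance] = w          # link to the downstream ring, weight w
    return W


def linkrank_modularity(W, labels):
    """Q^lr with alpha = 1 (no teleportation); assumes no dangling nodes."""
    n = W.shape[0]
    G = W / W.sum(axis=1)[:, None]
    A = np.vstack([G.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)  # periodic walk: solve instead of iterating
    same = labels[:, None] == labels[None, :]
    return float(((pi[:, None] * G - np.outer(pi, pi)) * same).sum())


n, m = 8, 8
rings = np.repeat(np.arange(n), m)               # embedded community of each node
for w in (1.0, 4.0):
    print(w, linkrank_modularity(ring_of_rings(n, m, w), rings))
```

As expected from the discussion above, $Q^{lr}$ of the embedded ring partition decreases as w grows, because more of the random walker's flow leaves each small ring.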

If the directions of links are ignored, a random walker is more likely to move out of each small ring than when link directions are considered. Therefore, when w is neither too small nor too large, a direction-ignoring method would fail to detect the community structure of the model network, while our direction-considering method should still detect it correctly. To verify this, we have to compare the identified community assignments quantitatively with the embedded community structure. Here, we use the variation of information (VOI), which is a true metric on the space of community assignments; it is zero for identical assignments and is small when the community assignments are similar.

FIG. 4: (color online) Variation of information and modularity as functions of w for a model network with m = n = 8. (a) Square (circle) symbols represent the VOI between the embedded community structure and the community assignment given by optimizing $Q^{ud}$ ($Q^{lr}$). (b) Square and circle symbols represent the highest $Q^{ud}$ and the highest $Q^{lr}$ found by the simulated annealing algorithm during the community identification process of the two methods. The black dotted line, black solid line, red dotted line, and red solid line correspond to the functions in Eq. (19), Eq. (20), Eq. dsn, and Eq. (21), respectively.

We tested both methods in a model network of 64 
nodes, in which m = 8 and n = 8. Since there arc no 
danghng nodes or trap regions in our model network, the 
teleportation rate (1 — a) is taken as zero. The com- 
munity assignments detected by both methods are com- 
pared with the embedded community structure, and the 
difference is measured by the VOT The results of VOI 
and modularity are plotted in Fig. ID Relatively small 
m and n are chosen in order to correctly find the high- 
est modularity, and simulated annealing algorithm [30|, 
which is an alg orithm showing best performance in the 
benchmark [13|, is chosen as the optimizing algorithm. 

As illustrated in Fig. Ufa), when w is small, the commu- 
nity assignments detected by both methods are identical 
to the embedded structure. However, when w is larger 
than 1.9, the VOI for the direction-ignoring method 
starts to get a non-zero value and becomes larger until 
it fixes at a stationary value. This means that the com- 
munity assignment detected by this method is getting 
more and more different from the embedded community 
structure and finally fixes at a stationary configuration. 
Meanwhile, the community assignment detected by our 
method is identical to the embedded community struc- 
ture in the illustrated range of w. This significant differ- 
ence indicates that a method that ignores the direction of 
links cannot identify the community structure effectively 
in this model network, while our method can detect the 
communities effectively. 

Further investigation on the values of modularity and 
the corresponding community assignments aid the better 
understanding of this model network and our method. 
Fig. [DJb) shows that the modularity values given by both 
methods also show different behaviors as the weight w 
becomes larger. When w is small, the community assign- 
ment detected by the direction-ignoring method is iden- 
tical to the embedded community structure. The mod- 
ularity for this community assignment can be expressed 
analytically as 



the community assignment by Karrer et al. [29|, to com- 
pare the different community assignments. The VOI is 
defined as 



N 



■log. 



2 B 



N 



■log2 



i=l j = l 

(18) 

where A and B are the two community assignments to 
be compared with, C"* and are the total number of 
communities of assignment A and B correspondingly, TV 
is total number of nodes, nf is the number of nodes in ith 
community of assignment A, is the number of nodes 
in jth community of assignment B, and n^^ is the num- 
ber of nodes which are in ith community of assignment 
A and in jth community of assignment B at the same 
time. Generally, the VOI is large if the compared two 
community assignments are significantly different, and it 



$$
Q_a^{\mathrm{ud}} = \frac{m}{m+w} - \frac{1}{n}, \qquad (19)
$$



and the black dotted line in Fig. 4(b) shows the curve
of this function. When w is larger than 4.3, a stationary
community assignment emerges, which is illustrated in
Fig. 5(b). This result is easy to understand: since the
weight w is now large, the nodes connected by the
inter-ring links are more likely to be assigned to the same
community. The modularity for this community assignment is

$$
Q_b^{\mathrm{ud}} = \frac{m+w-2}{m+w} - \frac{1}{n}, \qquad (20)
$$

and the black solid line in Fig. 4(b) shows the curve of this
function. In the range $w \in (1.9, 4.3)$, the modularity
values found by simulated annealing are slightly larger
than these analytical values because some transitional
community assignments exist.
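Comparing the two direction-ignoring expressions locates the point where the stationary assignment overtakes the embedded one. The sketch below reads Eq. (19) as $m/(m+w) - 1/n$, the form that matches Eq. (20) term by term (this reading is our assumption), and uses Python's exact `fractions.Fraction` arithmetic; the function names are hypothetical. The difference simplifies to $(2-w)/(m+w)$, so the curves cross at w = 2 for every m and n, just above the value w ≈ 1.9 at which the VOI in Fig. 4(a) first departs from zero; the transitional assignments presumably account for the small gap.

```python
from fractions import Fraction

def q_ud_a(m, n, w):
    """Eq. (19) as read here: embedded (ring) assignment, directions ignored."""
    return Fraction(m) / (m + w) - Fraction(1, n)

def q_ud_b(m, n, w):
    """Eq. (20): the stationary assignment of Fig. 5(b), directions ignored."""
    return Fraction(m + w - 2) / (m + w) - Fraction(1, n)

def gap(m, n, w):
    """Q_a - Q_b; algebraically (2 - w) / (m + w), so the sign flips at w = 2."""
    return q_ud_a(m, n, w) - q_ud_b(m, n, w)

for w in (Fraction(1), Fraction(19, 10), Fraction(2), Fraction(3)):
    print(w, gap(8, 8, w))
```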



A similar analysis can be performed for our method.
For the community assignment that is identical to the
embedded community structure (Fig. 5(a)), the new
modularity given by our method is

$$
Q_a^{\mathrm{LR}} = \frac{mw+2m}{mw+2m+2w} - \frac{1}{n}. \qquad (21)
$$

For the community assignment of Fig. 5(b), which is the
stationary community assignment given by the direction-
ignoring method for large w, the new modularity is

$$
Q_b^{\mathrm{LR}} = \frac{mw+2m-4}{mw+2m+2w} - \frac{1}{n}. \qquad (22)
$$

It is easy to see that $Q_a^{\mathrm{LR}}$ is always larger than
$Q_b^{\mathrm{LR}}$, whatever value w takes, since the two differ
by $4/(mw+2m+2w) > 0$. This means that, no matter how large
w is, a community assignment like that of Fig. 5(b) will
never be detected by our method.
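The two LinkRank expressions can be checked numerically. The sketch below is our illustration, not the authors' implementation. It assumes the same hypothetical wiring as before (n directed rings of m nodes, m even; ring links of weight 1; an inter-ring link of weight w from node m/2 of each ring to node 0 of the next), takes LinkRank as $L_{ij} = \pi_i G_{ij}$ with teleportation switched off, and evaluates $Q^{\mathrm{LR}} = \sum_{ij}(L_{ij} - \pi_i\pi_j)\delta(c_i,c_j)$. For the Fig. 5(b)-type assignment, the exact cut positions are our assumption: nodes 0 to m/2 − 1 of each ring join the community of the previous ring. Under these assumptions the results match $(mw+2m)/(mw+2m+2w) - 1/n$ and $(mw+2m-4)/(mw+2m+2w) - 1/n$.

```python
from collections import defaultdict

def model_network(m, n, w):
    """ASSUMED wiring (hypothetical): n directed rings of m nodes (m even).
    Ring links k -> k+1 (mod m) have weight 1; a link of weight w goes
    from node m//2 of ring r to node 0 of ring r+1 (mod n)."""
    out = defaultdict(dict)  # out[i][j] = weight of link i -> j
    for r in range(n):
        for k in range(m):
            out[r * m + k][r * m + (k + 1) % m] = 1.0
        out[r * m + m // 2][((r + 1) % n) * m] = w
    return out

def transition(out):
    """G_ij for alpha = 1: link weight divided by the out-strength of i."""
    return {i: {j: x / sum(nb.values()) for j, x in nb.items()} for i, nb in out.items()}

def stationary(trans, size, iters=20000, tol=1e-14):
    """pi = pi G by lazy power iteration (laziness removes periodicity)."""
    pi = [1.0 / size] * size
    for _ in range(iters):
        new = [0.5 * p for p in pi]
        for i, nb in trans.items():
            for j, p in nb.items():
                new[j] += 0.5 * pi[i] * p
        done = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if done:
            break
    return pi

def linkrank_q(trans, pi, label):
    """Q_LR = sum_ij (L_ij - pi_i pi_j) delta(c_i, c_j) with L_ij = pi_i G_ij."""
    intra = sum(pi[i] * p for i, nb in trans.items() for j, p in nb.items()
                if label[i] == label[j])
    share = defaultdict(float)  # total stationary probability per community
    for i, p in enumerate(pi):
        share[label[i]] += p
    return intra - sum(s * s for s in share.values())

def assignments(m, n):
    """(a) one community per ring; (b) Fig. 5(b)-type: nodes 0..m/2-1 of
    ring r join the community of ring r-1 (cut positions assumed)."""
    a = [i // m for i in range(m * n)]
    b = [(i // m - 1) % n if i % m < m // 2 else i // m for i in range(m * n)]
    return a, b

m = n = 8
a, b = assignments(m, n)
for w in (0.5, 2.0, 8.0):
    trans = transition(model_network(m, n, w))
    pi = stationary(trans, m * n)
    print(w, round(linkrank_q(trans, pi, a), 4), round(linkrank_q(trans, pi, b), 4))
```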

In Fig. 4(b), both the simulated annealing results and the
analytical functions indicate that, as the weight w
increases, the direction-ignoring modularity $Q^{\mathrm{ud}}$
decreases rapidly while our modularity $Q^{\mathrm{LR}}$
decreases relatively slowly. When w is larger than 1.9,
the community assignment that gives the highest
$Q^{\mathrm{ud}}$ changes from one identical to the embedded
community structure to a different one. Because the nodes
connected by the inter-ring links are more likely to be
assigned to the same community in this new assignment,
the new assignment favors larger w, and $Q^{\mathrm{ud}}$ starts
to increase as w becomes larger. Meanwhile, $Q^{\mathrm{LR}}$
decreases continuously as w increases, which is consistent
with the fact that the community structure becomes weaker
as w grows.

Both the VOI and the modularity results show that our
method can correctly and robustly detect the community
structure of this model network, while the direction-
ignoring method cannot. We also performed the same
analysis on model networks with various values of m
and n, and all the results are qualitatively the same as
those for m = n = 8.
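The VOI of Eq. (18) is straightforward to compute from two label lists. The sketch below is a minimal illustration (the function name `voi` and the label-list representation are our choices); iterating only over pairs with $n_{ij} > 0$ implements the convention that empty overlaps contribute zero. The usage lines compare the ring assignment of a 64-node network with a hypothetical assignment that moves half of each ring into the neighboring ring's community.

```python
from collections import Counter
from math import log2

def voi(a, b):
    """Variation of information of Eq. (18), in bits.

    a and b give one community label per node, in the same node order.
    Only pairs (i, j) with n_ij > 0 are visited, so terms with n_ij = 0
    contribute zero."""
    if len(a) != len(b):
        raise ValueError("both assignments must label the same nodes")
    N = len(a)
    size_a = Counter(a)          # n_i^A
    size_b = Counter(b)          # n_j^B
    joint = Counter(zip(a, b))   # n_ij
    total = 0.0
    for (i, j), nij in joint.items():
        total -= nij / N * (log2(nij / size_a[i]) + log2(nij / size_b[j]))
    return total

m = n = 8
rings = [v // m for v in range(m * n)]
# Hypothetical assignment: half of each ring joins the previous ring's community.
shifted = [(v // m - 1) % n if v % m < m // 2 else v // m for v in range(m * n)]
print(voi(rings, rings), voi(rings, shifted))
```

Because the VOI depends only on the overlap counts, relabeling the communities of either assignment leaves it unchanged, which is what makes it suitable for comparing an identified assignment with the embedded one.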



VI. SUMMARY 

In this paper, we have presented a new definition of
modularity in directed networks by introducing a new
quantity, LinkRank, which indicates the importance of
links in directed networks. The new modularity is related
to the random walk process on the network. Globally, it is
the fraction of time a random walker spends moving within
communities minus the expected value of this fraction.
Locally, it means that a link with higher LinkRank is more
likely to be assigned as an intra-community link than a
link with lower LinkRank. The definition of community
changes along with the modularity: in this new definition,
a community is a group of nodes in which a random walker
is more likely to stay.
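The global reading of the new modularity can be spelled out for a general weighted directed network. The sketch below is our illustration, not the authors' implementation: it builds a Google matrix in the standard PageRank way (the damping factor α and the uniform treatment of dangling nodes are assumptions about the details), computes π by power iteration, and sums $L_{ij} - \pi_i\pi_j$ with $L_{ij} = \pi_i G_{ij}$ over pairs in the same community. The function name `linkrank_modularity` is hypothetical. With α = 1 on an undirected network (every link stored in both directions), it reproduces the Newman–Girvan modularity, in line with the consistency claimed in this paper.

```python
def linkrank_modularity(n_nodes, edges, label, alpha=0.85, iters=10000, tol=1e-13):
    """Q_LR = sum_ij (L_ij - pi_i pi_j) delta(c_i, c_j), with L_ij = pi_i G_ij.

    Assumed Google matrix: G_ij = alpha * w_ij / s_i + (1 - alpha) / N, where
    s_i is the out-strength of i; rows of dangling nodes (s_i = 0) are uniform.
    edges holds (i, j, weight) triples for links i -> j."""
    N = n_nodes
    w = [[0.0] * N for _ in range(N)]
    for i, j, x in edges:
        w[i][j] += x
    G = []
    for i in range(N):
        s = sum(w[i])
        if s == 0:  # dangling node: the walker jumps to a uniformly random node
            G.append([1.0 / N] * N)
        else:
            G.append([alpha * w[i][j] / s + (1 - alpha) / N for j in range(N)])
    pi = [1.0 / N] * N  # PageRank by power iteration: pi <- pi G
    for _ in range(iters):
        new = [sum(pi[i] * G[i][j] for i in range(N)) for j in range(N)]
        done = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if done:
            break
    q = 0.0
    for i in range(N):
        for j in range(N):
            if label[i] == label[j]:
                q += pi[i] * G[i][j] - pi[i] * pi[j]  # LinkRank minus its expectation
    return q

# Two triangles joined by one undirected link; each undirected link = two directed ones.
und = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
edges = [(i, j, 1.0) for i, j in und] + [(j, i, 1.0) for i, j in und]
print(linkrank_modularity(6, edges, [0, 0, 0, 1, 1, 1], alpha=1.0))  # Newman-Girvan value 5/14
```

Setting α < 1 is what keeps the power iteration well defined on networks with dangling nodes or trap regions; in the model network above neither is present, which is why the text takes (1 − α) as zero there.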
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FIG. 5: (color online) Community assignments given by 
direction-ignoring method and our method in the model net- 
work of m = n = 8. (a) The community assignment given by 
direction- ignoring method, when w is smaller than 1.9. And 
the community assignment given by our method through the 
illustrated values of w. (b) The stationary community assign- 
ment given by direction-ignoring method, when w is larger 
than 4.3. 



We have shown that our new modularity is consistent
with the original modularity proposed by Newman and
Girvan [9]. We have also compared our method with other
community identification methods and shown that the
method proposed by Rosvall and Bergstrom [13] shares the
same concept of community structure in directed networks
with our method. We designed a model network to verify
our method, and this network can also serve as a benchmark
in further studies of community identification. Since most
modularity optimization methods developed for undirected
networks can be applied to directed networks simply by
optimizing our new modularity, our method should be highly
practical for identifying communities in directed networks.